Comparing the Men and Women of Philosophy

INTRO

For a long time, women philosophers were largely absent from the published record. The majority of the most famous philosophy texts were written by men. In this report, I look at the philosophical texts of women and how they compare to those of men in terms of sentiment and content overlap.

DATA & METHODOLOGIES

The main dataset used for this analysis can be found at https://www.kaggle.com/kouroshalizadeh/history-of-philosophy. It contains over 300,000 sentences from over 50 texts spanning 10 major schools of philosophy. The represented schools are: Plato, Aristotle, Rationalism, Empiricism, German Idealism, Communism, Capitalism, Phenomenology, Continental Philosophy, and Analytic Philosophy.

Two other datasets that I source for this analysis come from Wikipedia and Google search results. For Wikipedia, I create a dataset that contains each author's Wikipedia biography page (example for Simone de Beauvoir: https://en.wikipedia.org/wiki/Simone_de_Beauvoir). The expectation is that the authors' Wikipedia pages will be an objective source on their lives regardless of sex. The Google dataset contains the content of the first 10 websites that appear in a search (example: "philosopher Beauvoir review"). I expect the Google dataset to be the most subjective, since we are searching for reviews and the review authors' opinions will be clear.

Methodologies used in this report include web scraping, sentiment analysis, and word clouds. Web scraping is the process of using automated code to extract content and data from a website; it is used here to collect the Google data. Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text. Opinions are typically categorized as positive, negative, or neutral and can be used to determine the writer's attitude towards a particular topic. A word cloud is a collection of words visualized in different sizes. The bigger and bolder a word appears, the more often it is mentioned within a text.
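
As a rough illustration of the positive/negative/neutral categorization described above, the sketch below scores a text against a tiny hand-made word list. The word lists, the function names simple_sentiment and label, and the 0.05 threshold are all hypothetical and chosen only for this example; they are not the scorers used in this report (see ../lib/README.md for those).

```python
import re

# Hypothetical toy lexicon, for illustration only; real lexicons are far larger.
POSITIVE = {"good", "great", "pioneer", "brilliant", "love"}
NEGATIVE = {"bad", "lazy", "bitter", "poor", "hate"}


def simple_sentiment(text: str) -> float:
    """Return (positive count - negative count) / total words, in [-1, 1]."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    return (pos - neg) / len(words)


def label(score: float, threshold: float = 0.05) -> str:
    """Map a score to positive, negative, or neutral (threshold is assumed)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


print(label(simple_sentiment("a brilliant pioneer")))
```

Dividing by the number of words keeps long and short texts comparable, which matters when averaging scores over sources of very different lengths.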

The data cleaning steps are removing special characters, removing stopwords, and stemming. The same steps are applied to all datasets. Removing special characters does exactly what it says: it removes special characters from the text. Stopwords are English words that do not add much meaning to a sentence (for example, "the" and "is"). They can safely be ignored without sacrificing the meaning of the sentence. Stemming is the process of reducing a word to its stem, the base form to which prefixes and suffixes attach (for example, "reasoning" becomes "reason").
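
The three cleaning steps above can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the stopword list is a hypothetical, abbreviated one, and crude_stem is a naive suffix stripper standing in for a real stemmer (such as a Porter stemmer), so its output will differ from the report's actual pipeline.

```python
import re

# Hypothetical, abbreviated stopword list; NLP libraries ship fuller ones.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "the", "to", "was"}

# Suffixes tried in this order; a crude stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text: str) -> list[str]:
    """Lowercase, drop non-letter characters and stopwords, then stem."""
    # Anything that is not a lowercase letter or whitespace becomes a space.
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return [crude_stem(t) for t in tokens]


print(clean("The objects of reason, and the situations!"))
# prints ['object', 'reason', 'situation']
```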

Further technical and specific explanations of these methodologies and their applications to this report can be found in ../lib/README.md.

ANALYSES

Describing the data

The data contain 13 schools of philosophy. They are:

There is far more data for men than for women. There are only 3 female authors in the data, compared with 32 male authors. The men make up about 95% of the data. Also, the male authors collectively have works in all but one school: feminism. The female authors write in only one school: feminism.

Below is a timeline of each author's first publication, separated by sex.

We can see that women came to the world of philosophy (in this dataset) much later than men. The male philosophers date all the way back to Plato, in 350 BC. Although men published work much earlier than women, only 4 authors have works published before the 1600s. Most men have publication dates after the 1600s, and many after the 1900s. All of the women's work was published after the 1750s.

Sentiment analysis

Below is a visualization of the average sentiment scores by sex for each data source (philosophy text data, Wikipedia bio data, and Google search data). We are interested in whether one data source or sex has a dramatically different sentiment score compared to the others.

First, we take a look at sentiment scores from the philosophy data (figure above). These scores reflect the authors' average sentiment in their own works.

We can see that women have slightly less positive scores than men, though in general everything in this chart is pretty neutral.

Next, we take a look at the sentiment scores for the Wikipedia data. We expect this source to be the most objective, since it is just a biography.

It is very interesting that the simple and VADER sentiment scores are much lower for women than for men. We expected the Wikipedia biographies to be the most objective data source of all, so this large difference is surprising. One possible explanation is that the men's biographies contain more positive words such as "pioneer", since the men are relatively more foundational to the subject of philosophy than the women.

Lastly, we take a look at sentiment scores from the Google search data (below). This data should logically be the most subjective of all, since we searched Google for reviews. The scores will probably reflect the opinion of each article's author.

The VADER sentiment score is about the same for men and women. In the TextBlob and simple sentiment scores, the women have slightly more positive scores than the men. This suggests that women's work in philosophy is received just as positively as men's work, or more so.

Word clouds

Word clouds are a great tool to see which words are most popular in texts. The larger words are used more commonly. Word clouds are like the stars - the longer you look, the more you'll see!
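
Under the hood, a word cloud is driven by word frequencies. As a small sketch of that counting step (the function name top_words and the token list are made up for illustration and are not taken from the report's data):

```python
from collections import Counter


def top_words(tokens: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)


# Toy, already-cleaned tokens; not real data from any of the datasets.
tokens = ["reason", "object", "reason", "must", "reason", "object"]
print(top_words(tokens, 2))
# prints [('reason', 3), ('object', 2)]
```

A word cloud library would then draw each word at a size that grows with its count.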

We first compare the word clouds of the philosophy data for men and women.

It is interesting to see that the men's word cloud has many more objective words such as "object" and "reason", and firm words such as "must". The women's word cloud has more subjective words such as "situation" and soft words such as "lover".

Next, we look at word clouds for the authors' Wikipedia pages below.

These word clouds look very different from the philosophy data word clouds. Both have words that you'd typically see in a biography: countries of origin, "first" (for an author's first work or for doing something novel), "book", author names, etc.

The Google word clouds look pretty similar to the Wikipedia word clouds.

Venn word clouds

Lastly, we'll take a look at the same word clouds, but as Venn diagrams. It will be interesting to see the overlap between sexes by source.
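
The three regions of such a Venn diagram correspond to plain set operations on the two vocabularies. A minimal sketch, assuming each sex's vocabulary has already been reduced to a set of words (the function name venn_regions and the toy inputs are hypothetical and do not reflect the report's data):

```python
def venn_regions(men_words: set[str], women_words: set[str]) -> dict[str, set[str]]:
    """Split two vocabularies into men-only, shared, and women-only words."""
    return {
        "men_only": men_words - women_words,
        "shared": men_words & women_words,
        "women_only": women_words - men_words,
    }


# Toy inputs for illustration only; not vocabularies from any dataset here.
men = {"alpha", "beta", "gamma"}
women = {"beta", "delta"}
regions = venn_regions(men, women)
print(sorted(regions["shared"]))
# prints ['beta']
```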

First, we'll look at the philosophy data.

It looks like a lot of the women's words are also found in the men's texts, but the opposite is not true. One word that appears only in the women's texts is "statistic", which is very interesting. This may suggest that the women rely on numbers and statistics in their texts more than the men do.

Next, we'll look at the Wikipedia data.

In the Wikipedia data we see the same relationship as in the philosophy data: a lot of the women's words are also found in the men's text, but the opposite is not true. Here we see some words that appear only in the women's pages and that we didn't see before (or not as large), such as "motherhood" and "futility". A new word that appears a lot (large and in the middle) compared to the philosophy data is "feminism". It is interesting that "feminism" does not overlap in the philosophy data above but does overlap here in the Wikipedia pages. A word that we see in the men's Wikipedia pages but not in the women's is "science"; interestingly, the same pattern does not appear in the texts. This may suggest that women's philosophical works aren't perceived to be as scientific as men's works are.

Lastly, we will look at the word cloud Venn diagram for the Google search texts. We hypothesized that Google would be the most subjective data source of all.

We see a bunch of new patterns here. Of interest are negative words, such as "lazy" and "bitter", that appear in the men's Google search results but not the women's.